Configure Druid SIP Connectivity

To establish the connection between your AI Agent and your telephony system:

Step 1. Set up Druid SIP general settings

In the Druid Portal, navigate to your AI Agent and select the Channels tab.
Search for 'sip' and click on the Druid SIP card.

In the configuration modal, General tab, configure the following parameters:

Setting	Description	Mandatory
SIP FQDN	The SIP FQDN of the voice AI Agent. Currently supported protocols: TLS, TCP, and UDP, on port 5060. Info: Copy the SIP FQDN; you will need it to connect the telephony system to Druid.
Enable mTLS	Turn this toggle on to enforce Mutual Transport Layer Security (mTLS) for SIP connections, which requires a valid client certificate for authentication. When enabled, connecting clients must present a valid client certificate during the TLS handshake. Only certificates issued by internationally trusted Certificate Authorities (CAs) are supported. Self-signed certificates and certificates issued by private or internal CAs are not supported. Info: This setting is available starting with Druid version 9.24.
Connection Types	Select the transport protocol(s) used for SIP signaling. At least one connection type must be enabled; you cannot disable all of them.
Allowed IPs	The signaling addresses of your SIP provider from which Druid SIP accepts traffic. Info: You can define up to 10 addresses. To add an address entry: (1) Click the Add button to add a new row. (2) Select the address format from the Type dropdown. Simple: a single, specific host address. Example: 192.168.1.50 or 109.166.157.238. Wildcard: matches multiple IP addresses within an octet using an asterisk (*). Example: 192.168.1.* (matches any address from 192.168.1.0 through 192.168.1.255). CIDR: a standard subnet block that defines an entire network range using a routing prefix. Example: 192.168.1.0/24 (includes all 256 addresses in that subnet) or 54.172.60.0/22. Range: an explicit starting and ending address for an authorized block of IP addresses. Example: 192.168.1.10 - 192.168.1.25. (3) Enter the network address details in the IP address field. (4) Select the transport Protocol and specify the Port if your setup requires fixed routing parameters. (5) Save the record. Use the Edit or Delete icons in the Actions column to manage existing entries.
Phone numbers	The DID phone numbers assigned to the AI Agent. Enter each number on a new line. IMPORTANT! Each number identifies the voice AI Agent and must be written in exactly the same format in which the contact center solution sends it in the SIP INVITE request (for example, in E.164 format: +1 321 111 222 3333).	Yes
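
To make the four Allowed IPs address formats more concrete, the following Python sketch shows one way such entries could be evaluated against an incoming address. It is an illustration only and not Druid's implementation: the function name ip_allowed and the lowercase type labels are hypothetical, and the sketch handles IPv4 addresses only.

```python
import ipaddress


def ip_allowed(ip: str, entry_type: str, pattern: str) -> bool:
    """Return True if `ip` matches one allow-list entry (illustrative only)."""
    addr = ipaddress.ip_address(ip)  # raises ValueError for malformed input
    if entry_type == "simple":
        return addr == ipaddress.ip_address(pattern)
    if entry_type == "wildcard":
        # Compare octet by octet; "*" accepts any value in that octet.
        ip_parts, pat_parts = ip.split("."), pattern.split(".")
        return (
            len(ip_parts) == 4
            and len(pat_parts) == 4
            and all(p == "*" or p == o for o, p in zip(ip_parts, pat_parts))
        )
    if entry_type == "cidr":
        return addr in ipaddress.ip_network(pattern, strict=False)
    if entry_type == "range":
        # Expected form: "start - end", both ends inclusive.
        start, end = (ipaddress.ip_address(p.strip()) for p in pattern.split("-"))
        return start <= addr <= end
    raise ValueError(f"unknown entry type: {entry_type}")


print(ip_allowed("192.168.1.77", "wildcard", "192.168.1.*"))
print(ip_allowed("54.172.63.10", "cidr", "54.172.60.0/22"))
print(ip_allowed("192.168.1.26", "range", "192.168.1.10 - 192.168.1.25"))
```

The CIDR example is a reminder that 54.172.60.0/22 spans 54.172.60.0 through 54.172.63.255, which is wider than a single /24 block.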

Now configure the call settings.

Step 2. Configure the call settings

Click the Call Settings tab and configure the following parameters:

Setting	Description	Mandatory	Default value
Enable Audio Recordings	Captures and stores the audio recordings of the voice interactions handled by the SIP trunk.	No	true
DTMF buffer timeout (ms)	The time to wait for the next touch-tone digit before processing the input.	Yes	3000ms
DTMF buffer terminator	The DTMF character used to indicate the end of user input. Leave the field empty to use only the DTMF timeout or the buffer length.	No	#
DTMF max buffer length	The maximum number of digits allowed before the input is automatically sent.	Yes	1
DTMF duplicate filter (ms)	The duration in milliseconds used to ignore accidental duplicate touch-tone inputs.	Yes	20ms
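
The interaction between the four DTMF settings can be easier to follow as a small simulation. The Python sketch below is a hypothetical model, not Druid's implementation: the function collect_dtmf and its parameter names are invented for illustration, events are (timestamp in ms, digit) pairs, and it assumes that the terminator itself is not included in the submitted input and that the duplicate filter works in milliseconds.

```python
def collect_dtmf(events, timeout_ms=3000, terminator="#", max_len=1, dup_filter_ms=20):
    """Group (timestamp_ms, digit) events into submitted inputs (illustrative)."""
    inputs, buffer = [], ""
    last_time, last_digit = None, None
    for t, digit in events:
        # Duplicate filter: drop the same digit repeated within dup_filter_ms.
        if digit == last_digit and last_time is not None and t - last_time < dup_filter_ms:
            continue
        # Buffer timeout: the gap since the previous digit was too long, so
        # the buffered digits are submitted before this digit is handled.
        if buffer and t - last_time >= timeout_ms:
            inputs.append(buffer)
            buffer = ""
        last_time, last_digit = t, digit
        # Terminator: submit what was collected (assumed: without the terminator).
        if terminator and digit == terminator:
            if buffer:
                inputs.append(buffer)
            buffer = ""
            continue
        buffer += digit
        # Max buffer length: submit automatically once the limit is reached.
        if len(buffer) >= max_len:
            inputs.append(buffer)
            buffer = ""
    if buffer:
        inputs.append(buffer)  # end of stream, treated like a final timeout
    return inputs


events = [(0, "1"), (5, "1"), (400, "2"), (900, "3"), (1300, "#"), (2000, "4"), (6000, "5")]
print(collect_dtmf(events, max_len=4))
```

In this run the second "1" is dropped as a duplicate, "#" submits "123", the 4-second gap submits "4", and "5" is flushed at the end. With a max buffer length of 1, every digit is submitted on its own, which suits single-key menus.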

Configure the desired speech services following the instructions in the subsequent section.

Step 3. Configure the speech services

Define the speech services that Druid SIP will use to process and synthesize voice interactions.

Configure the voice recognition (STT) settings

In the Speech-to-Text (STT) section, configure the following settings:

From the Speech-to-Text Provider dropdown, select the STT service provider. Additional settings appear specific to the selected provider.
Fill in the details specific to the selected provider:

Druid. Enter the connection details you received from Druid.

Azure. Enter your Subscription key and the Region identifier. Take your region identifier from the Microsoft documentation.

ElevenLabs. Enter your ElevenLabs API key.

Mistral. Enter your Mistral AI API key. The Base URL and the Voxtral speech model are automatically filled in.

Deepgram. Enter your Deepgram API key. The Base URL and the speech model are automatically filled in.

Soniox. Enter your Soniox API key. The Base URL and the speech model are automatically filled in.

Configure how the system detects speech, manages background noise, and allows the user to speak over the AI Agent:

Setting	Description	Mandatory	Default value
Enable VAD	Enables Voice Activity Detection to identify when a user starts or stops speaking.	Yes	true
Speech threshold	The sensitivity level (0 to 1.0) for detecting speech. Higher values require louder input.	Yes	0.5
Enable barge-in	Allows the user to interrupt the AI Agent while it is speaking.		true
Barge-in consecutive frames	The number of consecutive audio frames required to trigger a barge-in interruption. Adjusting this value allows you to balance the AI Agent responsiveness against the risk of false triggers from background noise. A lower frame count makes the AI Agent more sensitive to interruptions, while a higher frame count requires a longer, sustained sound to trigger a response.		3 frames (during tech preview)

Use the table below to estimate the impact of your frame count on the user experience:

Audio Unit	Definition	Approximate Time Frame	Equivalent
Frame	Smallest packet of processed audio	20ms	1 frame
Syllable	Linguistic unit of speech	200ms	~10 frames
Word	Complete semantic unit	500ms	~25 frames
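
The arithmetic behind the table above is simple enough to script when you compare candidate values. The following Python sketch assumes the 20 ms frame duration from the table; the function name barge_in_delay_ms is hypothetical.

```python
FRAME_MS = 20  # frame duration, taken from the table above


def barge_in_delay_ms(consecutive_frames: int, frame_ms: int = FRAME_MS) -> int:
    """Approximate duration of sustained sound needed to trigger barge-in."""
    return consecutive_frames * frame_ms


for frames in (3, 10, 25):
    print(f"{frames} frames = about {barge_in_delay_ms(frames)} ms of sustained sound")
```

A setting of 3 frames corresponds to roughly 60 ms, which is shorter than a typical syllable, so short noises may trigger an interruption; 10 frames corresponds to about one syllable.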

If using Druid or ElevenLabs, turn on the Use External VAD toggle to use the STT provider's Voice Activity Detection (VAD) to identify when speech begins and ends.

NOTE: Keep Use External VAD disabled for Azure Speech Services as VAD is managed server-side.

The Language detection mode controls when and how often the system analyzes the incoming audio to identify the spoken language. Select the desired option:

None: No automatic language detection is performed.
At Start: The language is identified only at the very beginning of the interaction; afterward, the language returned by the AI Agent is used.
Continuous: Language detection is performed continuously throughout the conversation.

Configure the detection timeouts:

Timeout	Description
Silence duration (ms)	The amount of silence required after a user stops speaking before the system considers the utterance finished and begins processing. Increase this value if users are frequently cut off mid-sentence; decrease it to make the AI Agent more responsive.
Conversation Idle Timeout Seconds	The duration of total inactivity allowed before the call is automatically disconnected. Default value: 120s.
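
As a rough model of how the speech threshold and the silence duration work together, the Python sketch below scans per-frame speech scores and reports when an utterance would be considered finished. Everything here is an assumption made for illustration: the function name utterance_end_ms, the idea of a per-frame score between 0 and 1, the 20 ms frame size, and the 600 ms silence value are not taken from Druid.

```python
def utterance_end_ms(frame_scores, speech_threshold=0.5, silence_ms=600, frame_ms=20):
    """Return the time in ms at which the utterance ends, or None if it has not ended."""
    needed = -(-silence_ms // frame_ms)  # silent frames required, rounded up
    heard_speech, silent_run = False, 0
    for i, score in enumerate(frame_scores):
        if score >= speech_threshold:
            # Speech frame: the user is (still) talking, reset the silence counter.
            heard_speech, silent_run = True, 0
        elif heard_speech:
            # Silence after speech: count it toward the end-of-utterance decision.
            silent_run += 1
            if silent_run >= needed:
                return (i + 1) * frame_ms
    return None


# 100 ms of leading silence, 200 ms of speech, then 600 ms of trailing silence.
scores = [0.1] * 5 + [0.9] * 10 + [0.2] * 30
print(utterance_end_ms(scores, silence_ms=600))
print(utterance_end_ms(scores, silence_ms=200))
```

With the longer silence value the utterance closes later, which gives slow speakers room to pause; with the shorter value the AI Agent can respond sooner but is more likely to cut a user off mid-sentence.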

Now configure the TTS settings.

Configure the voice synthesis (TTS) settings

In the Text-to-Speech (TTS) section, configure the voice synthesis settings. You can use different TTS service providers for different AI Agent languages.

To add a TTS provider for a specific language:

Click the Add language button.
In the Language field, enter the standard Druid language code (for example, ro for Romanian) that matches either the default or an additional language configured for your AI Agent. For more information, see Druid Supported Languages.
From the Text-to-Speech Provider dropdown, select the desired TTS service provider. Additional settings appear specific to the selected provider.
Fill in the details specific to the selected provider:

Druid. Enter the details you received from your Druid representative. In the Language field, enter the Druid language code (see Druid Supported Languages).
Azure. Enter your Subscription key and the Region identifier. Take your region identifier from the Microsoft documentation.

In the Synthesis voice field, enter the specific voice the AI Agent will use to respond. Take the voice identifier from the Microsoft documentation.

ElevenLabs. Enter your ElevenLabs API key, the Voice ID and Model ID to be used.
Mistral. Enter your Mistral AI API key. In the Mistral Voice Id field, enter a valid voice ID from your Mistral account. The selected voice determines the characteristics of the synthesized audio, such as accent, tone, and speaking style.

Info: To obtain a voice ID, create or select a voice in your Mistral account and copy its identifier from the Mistral Voices API or voice management interface.

The Base URL and the Voxtral speech model are automatically filled in.

Deepgram. Enter your Deepgram API key. The Base URL and the speech model are automatically filled in. In the Deepgram Model Id field, you can enter the specific Deepgram voice model the AI Agent will use to respond. For the list of available voice models and voice names, see the Deepgram documentation.
Soniox. Enter your Soniox API key. In the Soniox Voice Id field, enter the voiceId of the Soniox voice the AI Agent will use to respond. The Base URL and the speech model are automatically filled in.

Save the configuration.
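
Because each TTS language entry must match the default or an additional language of the AI Agent, it can help to cross-check your planned entries before saving. The Python sketch below is a hypothetical helper, not part of Druid: the function name is invented, and apart from ro the language codes are placeholders that may not match the codes configured for your AI Agent.

```python
def unmatched_tts_languages(tts_languages, default_language, additional_languages=()):
    """Return TTS language codes that match no language configured for the AI Agent."""
    agent_languages = {default_language, *additional_languages}
    return sorted(set(tts_languages) - agent_languages)


# Placeholder codes: "ro" comes from the example above, the others are illustrative.
print(unmatched_tts_languages(["ro", "en", "fr"], default_language="en", additional_languages=["ro"]))
```

Any code returned by the helper points to a TTS entry whose language is not configured for the AI Agent.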

Context Parameters

After the Druid SIP configuration activates, the system automatically provisions specific [[ChatUser]] parameters within the conversation context. The following parameter is initialized by default:

[[ChatUser]].ChannelId = "druid-sip"

Step 4. Connect your telephony system to Druid

Configure your telephony provider to use the SIP FQDN as the primary routing domain. You can find this value in the Druid SIP Configuration modal.
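
Before configuring the telephony provider, you may want to confirm that the SIP FQDN is reachable from your network. The Python sketch below attempts a plain TCP connection on port 5060. It is only a basic reachability check under stated assumptions: it does not speak SIP, it says nothing about UDP, and a failure can also mean that the address you are testing from is not covered by Allowed IPs. The function name can_connect is hypothetical.

```python
import socket


def can_connect(host: str, port: int = 5060, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Usage (replace the placeholder with the SIP FQDN copied from the Druid Portal):
#   can_connect("your-sip-fqdn.example")
```

A True result only shows that the port accepts TCP connections; a test call through your telephony system or a softphone remains the real verification.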

Testing with MicroSIP

For testing purposes, follow these steps to configure MicroSIP:

Download and install MicroSIP from the official website.
Run the application as Administrator.

Set up an account:

Click the Actions icon (the downward arrow) and select Add Account.

Complete the following fields in the Account modal:

Field	Description
Account Name	Enter a name for your reference.
SIP Server	Enter the SIP FQDN copied from the Druid Portal.
SIP Proxy	Enter the SIP FQDN copied from the Druid Portal.
Username	Enter your designated SIP username.
Domain	Enter the details you received from Druid.
Login	Enter the details you received from Druid.
Password	Enter the details you received from Druid.
Media Encryption	Set to Disabled. NOTE: Encryption is currently unsupported.
Transport	Select TLS, TCP or UDP. Druid communicates with telephony systems on port 5060.

Click Save to activate the account.